Universal consistency and minimax rates for online Mondrian Forests

نویسندگان

  • Jaouad Mourtada
  • Stéphane Gaïffas
  • Erwan Scornet
چکیده

We establish the consistency of an algorithm of Mondrian Forest [LRT14, LRT16], a randomized classification algorithm that can be implemented online. First, we amend the original Mondrian Forest algorithm proposed in [LRT14], that considers a fixed lifetime parameter. Indeed, the fact that this parameter is fixed actually hinders statistical consistency of the original procedure. Our modified Mondrian Forest algorithm grows trees with increasing lifetime parameters λn, and uses an alternative updating rule, allowing to work also in an online fashion. Second, we provide a theoretical analysis establishing simple conditions for consistency. Our theoretical analysis also exhibits a surprising fact: our algorithm achieves the minimax rate (optimal rate) for the estimation of a Lipschitz regression function, which is a strong extension of previous results [AG14] to an arbitrary dimension.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mondrian Forests: Efficient Online Random Forests

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as ...

متن کامل

The Mondrian Kernel

We introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel. It is suitable for both batch and online learning, and admits a fast kernel-width-selection procedure as the random features can be re-used efficiently for all kernel widths. The features are constructed by sampling trees via a Mondrian process [Roy and Teh, 2009], and we highlight the connection to Mon...

متن کامل

On the Minimax Optimality of Block Thresholded Wavelets Estimators for ?-Mixing Process

We propose a wavelet based regression function estimator for the estimation of the regression function for a sequence of ?-missing random variables with a common one-dimensional probability density function. Some asymptotic properties of the proposed estimator based on block thresholding are investigated. It is found that the estimators achieve optimal minimax convergence rates over large class...

متن کامل

Mondrian Forests for Large-Scale Regression when Uncertainty Matters

Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimat...

متن کامل

The Mondrian Process for Machine Learning

This report is concerned with the Mondrian process [1] and its applications in machine learning. The Mondrian process is a guillotine-partition-valued stochastic process that possesses an elegant self-consistency property. The first part of the report uses simple concepts from applied probability to define the Mondrian process and explore its properties. The Mondrian process has been used as th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017